From NVIDIA Megatron-LM for visibility #18

Open
wants to merge 4,946 commits into
base: multi-query-attention
from

Conversation

RaymondLi0
Collaborator

No description provided.

RaymondLi0 changed the base branch from multi-query-attention to before-merge on June 20, 2023 20:12
RaymondLi0 changed the base branch from before-merge to multi-query-attention on June 20, 2023 20:12
ko3n1g and others added 28 commits June 11, 2025 02:53
Expose TE fused MLP with module spec

See merge request ADLR/megatron-lm!3384
Moe inference functional tests

See merge request ADLR/megatron-lm!3403
ci: Benchmark release tests suite with TE2.2 on H100

See merge request ADLR/megatron-lm!3458
Move data to GPU for TP data processing

See merge request ADLR/megatron-lm!3371
Signed-off-by: oliver könig <[email protected]>
…nd fix shape mismatch between vision and language transformer

Co-authored-by: Gao Deng <[email protected]>
Co-authored-by: Gao Deng <[email protected]>
Optimize dummy weight tensors for cudagraph and fix shape mismatch between vision and language transformer

See merge request ADLR/megatron-lm!3366
Add --enable-experimental to args.

See merge request ADLR/megatron-lm!3377
perf(MLA): MLA down proj switch back to TELinear

See merge request ADLR/megatron-lm!3281
ci: Retry on network errors

See merge request ADLR/megatron-lm!3463
Co-authored-by: Oliver Koenig <[email protected]>
Co-authored-by: Guyue Huang <[email protected]>
Co-authored-by: Guyue Huang <[email protected]>
Add TE functional tests

See merge request ADLR/megatron-lm!3361
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: oliver könig <[email protected]>
…ns to CPU

Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Added support for offloading Swiglu activations to CPU

See merge request ADLR/megatron-lm!3024
ko3n1g and others added 30 commits July 15, 2025 00:09
Force inference to always gather logits with tensor parallelism

See merge request ADLR/megatron-lm!3442
Only run prefill for requests that do not generate tokens

See merge request ADLR/megatron-lm!3499
Co-authored-by: Cyril Meurillon <[email protected]>
Co-authored-by: Cyril Meurillon <[email protected]>
Enable reruns by default

See merge request ADLR/megatron-lm!2739
…lidation feature

Co-authored-by: Ye Yu <[email protected]>
Co-authored-by: Chenhan Yu <[email protected]>
Clean up ModelOpt finetune scripts and add validation feature

See merge request ADLR/megatron-lm!3268
Fix typo in parallel_state expert parallelism

See merge request ADLR/megatron-lm!3548
Fix cuda graph logic to determine first/last layers per stage in flexible pp layout

See merge request ADLR/megatron-lm!3505
Remove extra barrier in checkpoint flow

See merge request ADLR/megatron-lm!3626
Fix error when TE is not installed

See merge request ADLR/megatron-lm!3625
…itializations and associated weight decay skipping.
Adding support for Spike No More embedding initializations and associated weight decay skipping.

See merge request ADLR/megatron-lm!3500
MiMo video VLM train example

See merge request ADLR/megatron-lm!3543
ci: Retry on `free(): invalid pointer`

See merge request ADLR/megatron-lm!3632
Co-authored-by: Keshav Santhanam <[email protected]>
Co-authored-by: William Dykas <[email protected]>
Add Dynamic Backend Inference Tests

See merge request ADLR/megatron-lm!3475
…ate loading with PP>1 to ensure bit-wise match after saving and loading.
fix(distckpt, moe): Fix distckpt optimizer state loading with PP>1 to ensure bit-wise match after saving and loading.

See merge request ADLR/megatron-lm!3394
tests: Fix segfaults (maybe?)

See merge request ADLR/megatron-lm!3605
Fix mrope with context parallel

See merge request ADLR/megatron-lm!3603